TRAVEL TO THE PRIMARY ELECTION: 2016 Election Contributions in Alabama, by Jay Cheong
## [1] "/Users/JAY/Desktop/Udacity/Project4/Project4_R/PROJECT4"
## cmte_id cand_id cand_nm
## Length:23645 P60006111:7550 Cruz, Rafael Edward 'Ted':7550
## Class :character P60007168:5446 Sanders, Bernard :5446
## Mode :character P60005915:4372 Carson, Benjamin S. :4372
## P00003392:3519 Clinton, Hillary Rodham :3519
## P60006723:1142 Rubio, Marco :1142
## P40003576: 336 Paul, Rand : 336
## (Other) :1280 (Other) :1280
## contbr_nm contbr_city contbr_st
## GUEVARA, MARIETTA S. DR. : 253 BIRMINGHAM: 3295 AL:23645
## GUEVARA, MARIETTA S. S. DR.: 108 HUNTSVILLE: 2219
## FORREST, ERIC : 102 MOBILE : 1255
## CHILDERS, DICEY S. MRS. : 86 MONTGOMERY: 1056
## HAMILTON, LAURA : 78 MADISON : 951
## MARKY, DONNA : 78 TUSCALOOSA: 595
## (Other) :22940 (Other) :14274
## contbr_zip contbr_employer
## Min. : 71 RETIRED : 6020
## 1st Qu.:352166810 NONE : 1321
## Median :357564271 N/A : 1193
## Mean :343661041 SELF EMPLOYED: 1135
## 3rd Qu.:361177552 SELF-EMPLOYED: 819
## Max. :369167108 (Other) :13129
## NA's :3 NA's : 28
## contbr_occupation contb_receipt_amt
## RETIRED : 6616 Min. :-5400
## NOT EMPLOYED : 1696 1st Qu.: 25
## HOMEMAKER : 641 Median : 40
## PHYSICIAN : 615 Mean : 134
## INFORMATION REQUESTED PER BEST EFFORTS: 608 3rd Qu.: 100
## (Other) :13467 Max. :10800
## NA's : 2
## contb_receipt_dt receipt_desc
## 29-FEB-16: 457 :23055
## 31-MAR-16: 332 REDESIGNATION TO GENERAL : 169
## 05-APR-16: 315 REDESIGNATION FROM PRIMARY : 166
## 31-DEC-15: 308 Refund : 79
## 05-MAR-16: 265 REATTRIBUTION / REDESIGNATION REQUESTED: 35
## 30-APR-16: 256 REATTRIBUTION FROM SPOUSE : 32
## (Other) :21712 (Other) : 109
## memo_cd memo_text form_tp
## :22666 :17236 SA17A:23064
## X: 979 * EARMARKED CONTRIBUTION: SEE BELOW: 5182 SA18 : 502
## * HILLARY VICTORY FUND : 496 SB28A: 79
## REDESIGNATION TO GENERAL : 169
## REDESIGNATION FROM PRIMARY : 166
## EARMARKED FROM MAKE DC LISTEN : 98
## (Other) : 298
## file_num tran_id election_tp
## Min. :1003942 ADA11F117B690406CBE7: 3 : 6
## 1st Qu.:1056899 C2044921 : 2 G2016: 278
## Median :1057795 C2831133 : 2 P2016:23361
## Mean :1059548 C3903522 : 2
## 3rd Qu.:1066824 C3915448 : 2
## Max. :1074038 C3944632 : 2
## (Other) :23632
## NA modifiedDate
## Mode:logical Min. :2014-12-22
## NA's:23645 1st Qu.:2015-11-11
## Median :2016-02-04
## Mean :2016-01-07
## 3rd Qu.:2016-03-14
## Max. :2016-04-30
##
The dataset records 23,645 contributions for the election, with 23 columns of information.
Univariate Plots Section

The plot shows the top five candidates by number of contributions. Ted Cruz ranked first and Bernard Sanders second. Hillary Clinton took fourth place.
Next, let's look at the economic picture in AL: which cities are the most populous and how much their residents contributed to their candidates.
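A ranking by contribution count can be obtained by counting rows per candidate. The sketch below is a minimal base R illustration, not the original analysis code. The toy data frame `contrib` and the shortened candidate names are hypothetical; only the column name `cand_nm` comes from the summary output.

```r
# Hypothetical toy data; the real dataset has 23,645 rows.
contrib <- data.frame(
  cand_nm = c("Cruz", "Cruz", "Cruz", "Sanders", "Sanders", "Clinton"),
  stringsAsFactors = FALSE
)

# Count contributions per candidate and sort from most to fewest.
counts <- sort(table(contrib$cand_nm), decreasing = TRUE)

# Keep the top five (fewer if there are fewer candidates).
top_five <- head(counts, 5)
print(top_five)
```

Counting rows measures how often people contributed, which is what this report calls popularity; it says nothing yet about the amounts.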

This plot shows how much money each candidate received, ordered by popularity. Cruz took first place, as expected. Sanders was second in popularity but did not receive much money in the election, less than even Clinton and Rubio.

This plot shows contributors' occupations, that is, what kinds of people supported their candidates. Retired people contributed far more often than any employed group, which is somewhat surprising.

I tried several bin widths on the date axis and found that contribution frequency peaks at the end of each month. As the first plot shows, much of the contributed money arrives at the end of every month. I am not sure why, but I receive my own paycheck near the end of the month, so contributors may simply have enough money at that point to send to their candidate.
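One way to check the end-of-month pattern without a plot is to bin contributions by day of month. This is only a sketch under assumptions: the sample dates are hypothetical values in the same `DD-MON-YY` form as `contb_receipt_dt`, and the three-way split of the month is an arbitrary choice, not the binning used for the histogram.

```r
# Hypothetical receipt dates in the DD-MON-YY form of contb_receipt_dt.
receipt_dt <- c("29-FEB-16", "31-MAR-16", "05-APR-16", "31-DEC-15", "30-APR-16")

# The day of month is the first two characters.
day_of_month <- as.integer(substr(receipt_dt, 1, 2))

# Bin days into thirds of the month to see where contributions cluster.
bins <- cut(day_of_month, breaks = c(0, 10, 20, 31),
            labels = c("1-10", "11-20", "21-31"))
bin_counts <- table(bins)
print(bin_counts)
```

If the end-of-month peak is real, the last bin should hold clearly more than a third of the contributions on the full data.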
Univariate Analysis
What is the structure of your dataset?
There are 23 variables and 23,645 observations. Some of the variables are useful for the analysis, for example candidate name, contribution amount, receipt date, contributor city, contributor occupation, and contributor employer. The others, such as the ID and memo fields, did not need to be included in the analysis.
Ordered from most contributions to fewest:
Candidate : Ted Cruz, Sanders, Carson, Clinton, Rubio, …, Stein
Year : 2016, 2015
City : Birmingham, Huntsville, Mobile, Montgomery, etc.
What is/are the main feature(s) of interest in your dataset?
I don't know politics well, but I have heard that rich people support Clinton and that the other side supports Sanders. Mainly, I will look at the relationship between contribution amount and other variables such as candidate, contributor, and city.
What other features in the dataset do you think will help support your investigation into your feature(s) of interest?
Candidate, contributor, contribution amount, city, receipt date, contributor employer, and contributor occupation.
Did you create any new variables from existing variables in the dataset?
Yes, I created modifiedDate, Year, Month, and Day.
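The report does not show how these variables were built. A plausible sketch, assuming they were derived from `contb_receipt_dt` (values such as `29-FEB-16` in the summary output), is given below. The sample vector is hypothetical, and adding 2000 to the two-digit year is an assumption that holds for this 2014 to 2016 data.

```r
# Hypothetical sample values in the format shown in the summary output.
receipt_dt <- c("29-FEB-16", "31-DEC-15", "05-APR-16")

# Split "DD-MON-YY" into its three parts.
parts <- strsplit(receipt_dt, "-", fixed = TRUE)
Day   <- as.integer(sapply(parts, `[`, 1))
# Match the month abbreviation against R's built-in English month.abb,
# which avoids depending on the session locale.
Month <- match(sapply(parts, `[`, 2), toupper(month.abb))
# Assumption: all two-digit years belong to the 2000s.
Year  <- 2000L + as.integer(sapply(parts, `[`, 3))

# Rebuild an ISO date string and convert it to a Date.
modifiedDate <- as.Date(sprintf("%04d-%02d-%02d", Year, Month, Day))
print(data.frame(receipt_dt, modifiedDate, Year, Month, Day))
```

Having a real `Date` column is what makes the monthly and daily groupings later in the report possible.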
Bivariate Plots Section

The plot above shows monthly contribution frequency for 2015 and 2016. People contributed far more often in 2016 than in 2015 because the election was just around the corner.
## [1] BUGGAY, DAVID S. MR. MEISLER, HERBERT A. MR.
## [3] HIPP, GEORGE KYNERD, KEVIN B. MR.
## [5] ASH, LANDON EDWIN MR. BUGGAY, DANETTE
## [7] KENDRICK, MICHAEL SCOTT MR. MATHIS, CHAD E. DR.
## [9] RANKIN, JOHN P. SMITH, CHADWICK
## 5165 Levels: AARON, JOAN AARON, JOAN J. MRS. ... ZWAHLEN, RENE
## [1] STEWART, JIMMY SUDDERTH, JOHN WHITMAN, JOHN
## [4] CARRIO, CLARENCE CAMPBELL, JUDITH COUSINS, JOHN MR.
## [7] DARBY, DEANA JENKINS, KRYSTAL PRINCE, BONNIE
## [10] NICHOLS, DAKOTA
## 5165 Levels: AARON, JOAN AARON, JOAN J. MRS. ... ZWAHLEN, RENE

I subset the raw data to contribution amounts greater than $0, which removes the refunded amounts. I wanted to compare the contribution amounts of the top 10 and the bottom 10 contributors. The top 10 contributed roughly 1,000 to 3,000 times as much as the bottom 10.
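The filtering and ranking step can be sketched in base R as follows. The data frame is a hypothetical toy example that reuses the column names `contbr_nm` and `contb_receipt_amt` from the summary output, and it ranks individual contributions; the original analysis may have summed amounts per contributor first.

```r
# Hypothetical toy data using column names from the summary output.
contrib <- data.frame(
  contbr_nm = c("A", "B", "C", "D", "E"),
  contb_receipt_amt = c(2700, -100, 5, 250, 1),
  stringsAsFactors = FALSE
)

# Keep only positive amounts; refunds appear as negative values.
positive <- contrib[contrib$contb_receipt_amt > 0, ]

# Sort by amount, largest first.
ranked <- positive[order(positive$contb_receipt_amt, decreasing = TRUE), ]

n <- 2  # the report uses 10; 2 keeps the toy example readable
top_n    <- head(ranked, n)
bottom_n <- tail(ranked, n)

# Ratio between the largest and the smallest contribution.
ratio <- max(ranked$contb_receipt_amt) / min(ranked$contb_receipt_amt)
print(ratio)
```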

Ted Cruz received the most contributions, but his average contribution is less than $100. In contrast, Jeb Bush ranks only fifth on the list, yet his average contribution is around $1,000 and the median appears to be about $250. Maybe rich people like to support Bush.
Ted Cruz received the highest total contribution amount in AL. Surprisingly, Hillary Clinton was fourth in popularity but received the second-highest total, which suggests that a small number of people contributed a lot of money to her. Sanders was third in the popularity plot but only sixth in the contributed-money plot, which means that many people supported him without contributing large sums. So it might be true that rich people supported Clinton and less wealthy people supported Sanders. Trump does not rank in either plot.
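The comparison of popularity, average, and total can be reproduced with a per-candidate summary table. The sketch below uses hypothetical toy amounts and shortened candidate names; only the column names `cand_nm` and `contb_receipt_amt` come from the dataset.

```r
# Hypothetical toy data; amounts are made up for illustration.
contrib <- data.frame(
  cand_nm = c("Cruz", "Cruz", "Cruz", "Bush", "Bush"),
  contb_receipt_amt = c(25, 50, 75, 250, 2700),
  stringsAsFactors = FALSE
)

# Group the amounts by candidate.
amt_by_cand <- split(contrib$contb_receipt_amt, contrib$cand_nm)

# One row per candidate: count, mean, median, and total.
stats <- data.frame(
  n      = sapply(amt_by_cand, length),
  mean   = sapply(amt_by_cand, mean),
  median = sapply(amt_by_cand, median),
  total  = sapply(amt_by_cand, sum)
)

# Order by number of contributions (popularity), most first.
stats <- stats[order(stats$n, decreasing = TRUE), ]
print(stats)
```

In the toy data, as in the report, the candidate with more contributions has the lower mean, and a mean well above the median signals a few very large contributions.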

A high percentage of contribution amounts stay below $1,000. The next stripe is around $1,000, followed by a small stripe around $2,000, another around $3,000, and a few points around $6,000.
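Stripes in a scatter plot of amounts usually correspond to a few round values that many people give. A quick way to confirm this, sketched here on a hypothetical vector of amounts, is to tabulate the most frequent values and the share below $1,000.

```r
# Hypothetical contribution amounts in dollars.
amt <- c(25, 25, 100, 1000, 1000, 1000, 2700, 50)

# Most frequent amounts first; these would show up as stripes in the plot.
common <- sort(table(amt), decreasing = TRUE)
print(head(common, 3))

# Share of contributions strictly below $1,000.
share_below_1000 <- mean(amt < 1000)
print(share_below_1000)
```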

Retired people contributed a large share of the money in this election. Contributions to Clinton and Cruz grow as time goes by, whereas contributions to Sanders and Rubio do not seem to arrive steadily.

Daily mean and daily sum/10 of the contribution amount. I just wanted to try another kind of plot.
Nothing special stands out. The mean is almost the same over the whole period, but the daily sum increases, so we can conclude that more people are contributing to their candidates.
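The two daily series can be computed by grouping on the date. This is a minimal sketch on hypothetical data; it assumes a `Date` column named `modifiedDate` as in the summary output, and it divides the sum by 10 only so that both series fit on one axis, as the plot title suggests.

```r
# Hypothetical toy data with a Date column and an amount column.
contrib <- data.frame(
  modifiedDate = as.Date(c("2016-02-28", "2016-02-28", "2016-02-29")),
  contb_receipt_amt = c(20, 40, 100)
)

# Mean and sum of the amounts for each day.
daily_mean <- tapply(contrib$contb_receipt_amt, contrib$modifiedDate, mean)
daily_sum  <- tapply(contrib$contb_receipt_amt, contrib$modifiedDate, sum)

# Combine into one data frame; the sum is scaled by 1/10 for plotting.
daily <- data.frame(
  date  = as.Date(names(daily_mean)),
  mean  = as.vector(daily_mean),
  sum10 = as.vector(daily_sum) / 10
)
print(daily)
```

A flat mean together with a rising sum implies a rising number of contributions per day, which is the conclusion drawn above.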

Popularity and mean value. Cruz received a large number of contributions relative to his mean contribution: each amount was small, but there were many of them. As popularity goes down, the mean value is generally higher.
## [1] "Cruz, Rafael Edward 'Ted'" "Sanders, Bernard"
## [3] "Carson, Benjamin S." "Clinton, Hillary Rodham"
## [5] "Rubio, Marco"

As expected earlier, Cruz and Carson received a lot of money, and a few of their contributions are over $10,000. Carson's amounts are mostly higher than those of the other four candidates, whose 75% values (third quartiles) are almost the same.

Bivariate Analysis
Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?
The relationship between candidate and contribution date was interesting. I could track, at least indirectly, how a candidate's popularity changed over time. Among retired people, contributions to Cruz keep increasing in the plot, as do contributions to Clinton.
The relationship between contribution amount and date was also interesting. It shows which amounts people generally give to a candidate. There were stripes just below $800, around $1,000, and around $3,000.
Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?
Looking at the contributions from retired people, the mean contribution amount fluctuated a lot at the beginning of 2015 and converged by the beginning of 2016, even though the total amount of contributions increased. People gave small amounts, but more people contributed to a candidate.
What was the strongest relationship you found?
Big cities such as Birmingham, Mobile, and Montgomery have far more residents than the countryside, so they contributed more money to the candidates they supported. Also, since retired people account for most of the contributions, we could say that older people have more interest in politics.
Multivariate Plots Section
## [1] Sanders, Bernard Cruz, Rafael Edward 'Ted'
## [3] Rubio, Marco Clinton, Hillary Rodham
## [5] Carson, Benjamin S.
## 21 Levels: Cruz, Rafael Edward 'Ted' < ... < Stein, Jill

Montgomery and Tuscaloosa supported the Democratic Party more than the Republican Party, and Clinton received the most support in those cities. In the other cities, Republican candidates mostly received the higher contributions.
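The dataset has no party column, so a comparison by party needs a lookup from candidate to party. The sketch below shows one way to do this with a named vector and `xtabs`. The toy rows and amounts are hypothetical; the candidate names are the five listed in the output above, and the lookup name `party_of` is introduced here for illustration only.

```r
# Hypothetical toy data using column names from the summary output.
contrib <- data.frame(
  contbr_city = c("MONTGOMERY", "MONTGOMERY", "BIRMINGHAM",
                  "BIRMINGHAM", "BIRMINGHAM"),
  cand_nm = c("Clinton, Hillary Rodham", "Sanders, Bernard",
              "Cruz, Rafael Edward 'Ted'", "Rubio, Marco",
              "Clinton, Hillary Rodham"),
  contb_receipt_amt = c(500, 27, 50, 100, 250),
  stringsAsFactors = FALSE
)

# Candidate-to-party lookup for the top five candidates.
party_of <- c(
  "Clinton, Hillary Rodham"   = "Democratic",
  "Sanders, Bernard"          = "Democratic",
  "Cruz, Rafael Edward 'Ted'" = "Republican",
  "Rubio, Marco"              = "Republican",
  "Carson, Benjamin S."       = "Republican"
)
contrib$party <- unname(party_of[contrib$cand_nm])

# Total contribution amount by city and party.
by_city_party <- xtabs(contb_receipt_amt ~ contbr_city + party, data = contrib)
print(by_city_party)
```

Candidates missing from the lookup would get `NA` as their party and drop out of the table, so the lookup should cover every candidate of interest.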

Among Democrats, Sanders received contributions more often than Clinton, even though the amounts were small. Among Republicans, Cruz dominated the contribution frequency in almost all cities.

In December 2015, Birmingham and Huntsville contributed more than usual.
In April 2016, contributions seem lower than expected. I guess this is because the data were collected in the middle of the month.

Monthly mean value for the top five candidates
Carson and Sanders received contributions steadily from the beginning until March 2016. Rubio had a high mean contribution from November 2015 onward. In contrast, Clinton received high contribution amounts before November 2015.
Multivariate Analysis
Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?
There are some trends between the date and the candidates' contribution amounts.
Monthly mean value for the top five candidates
Carson and Sanders received contributions steadily from the beginning until March 2016. Rubio's contributions suddenly increased from November 2015. In contrast, Clinton received high contribution amounts before November 2015 and smaller amounts after that. The trend changed a lot in the three months between October 2015 and December 2015.
Were there any interesting or surprising interactions between features?
Popularity and mean contribution amount do not necessarily go together. Even though Cruz was strongly supported in Alabama, his mean contribution was small. Jeb Bush, conversely, was supported by few people, yet his average contribution was the highest. The funny thing is that the more people support a candidate, the smaller the mean contribution usually is. I am curious about the relationship between popularity and total contribution amount.
OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.
Final Plots and Summary

Description One
There is certainly a relationship between candidate popularity and contribution amount, although it does not match well for every candidate. Normally, we would expect a candidate with many supporters to receive more contributions, and that is true here. As the left and right plots show, the highly ranked candidates who received a lot of contributions are mostly the popular ones in Alabama, with the exception of Jeb Bush.

Description Two
Five popular candidates are easy to notice throughout the year: Sanders, Rubio, Cruz, Clinton, and Carson, from the top. Interestingly, at the end of April it looks as if only three candidates are left in the election (Sanders, Cruz, and Clinton), which is not true. Three more candidates still received a faint trickle of contributions: Trump, Rubio, and Kasich.
On the right side of the plot, Carson received contributions steadily, but the amounts are not large. Like Sanders, he might be supported by blue-collar workers. As I mentioned before, I don't know politics well; everything here is just what I see in the plot. Clinton and Rubio, on the other hand, sometimes received large amounts and might be supported by white-collar workers.

Description Three
The top ten cities, such as Birmingham, Huntsville, Mobile, and Montgomery, hold most of the population, so they can reflect candidate support across the state of AL. I could say that Republican candidates are more popular than Democratic ones in AL. Sanders and Cruz were supported the most in the big cities, but their contributed amounts are almost the same as those of other candidates such as Carson and Clinton.
Reflection
This project looked at the 2016 election in Alabama. There were 23,645 observations and 23 columns, including candidate name, contributor name, city, employer, contribution amount, and date. I would not insist that contribution amount always follows candidate popularity, because there are always exceptional candidates like Trump, but generally it holds. There are many cities in Alabama, even though the population is concentrated in about five of them, including Birmingham, Huntsville, Mobile, and Montgomery. So I think the candidate who wins in those big cities will take Alabama.
The plots above show that people supported their candidates at the end of every month, I think because that is when they have enough money. I am not sure when retired people receive their monthly 401(k) income, but I assume it is at the end of the month. The amount of contributions also increases as the election gets closer. I could see stripes at contribution amounts of $3,000, $1,000, $700, $500, and $300.
The candidates can be separated into three groups: popular candidates who received small contributions from many people, such as Cruz, Sanders, and Carson; popular candidates who received large amounts, such as Clinton and Rubio; and less popular candidates who received large contributions, such as Jeb Bush.
There are contribution trends, and they could be one way to think about candidate popularity. For some candidates, such as Cruz, Clinton, and Sanders, contribution frequency increases as the election gets closer; for others, such as Rubio and Carson, it stops increasing at some point.
Judging from the plots, three candidates are left for the final election: Clinton, Sanders, and Cruz. In reality, however, it is Donald Trump who remains and not Ted Cruz. So it is not easy to say that contribution amount is directly connected to popularity, although Donald Trump is a special case: a very wealthy candidate with little need for contributions. If I had the chance to analyze data for the whole USA, it could be more interesting than Alabama alone, but I am still satisfied with the result, and it was fun.